Skip to content

fix: surface registry fetch errors in GET /templates/registries (#2342)#2355

Merged
kmendell merged 3 commits intogetarcaneapp:mainfrom
GiulioSavini:fix/template-registry-surface-errors
Apr 12, 2026
Merged

fix: surface registry fetch errors in GET /templates/registries (#2342)#2355
kmendell merged 3 commits intogetarcaneapp:mainfrom
GiulioSavini:fix/template-registry-surface-errors

Conversation

@GiulioSavini
Copy link
Copy Markdown
Contributor

@GiulioSavini GiulioSavini commented Apr 12, 2026

Summary

  • Template registry fetch errors were silently swallowed in loadRemoteTemplates (return nil // Don't fail the whole group), leaving users with no way to diagnose why templates stopped loading after upgrading.
  • Root cause in v1.17.3: SSRF protection (ValidateSafeRemoteURL) performs a DNS lookup on the registry URL. In restricted network environments (no outbound DNS, private registries on RFC1918 addresses) this blocks the request with no user-visible feedback.
  • Fix: track the last fetch error per registry ID in a lightweight in-memory map on TemplateService. On success the entry is cleared. The existing GET /templates/registries response now includes lastFetchError on each registry object so users can see the exact error without checking server logs.

Closes #2342

Test plan

  • Add a template registry with an unreachable URL
  • Call GET /templates/registries — verify lastFetchError is populated with the error message
  • Fix the URL / make the registry reachable
  • Verify lastFetchError disappears from the next response after a successful fetch

🤖 Generated with Claude Code

Disclaimer Greptiles Reviews use AI, make sure to check over its work.

To better help train Greptile on our codebase, if the comment is useful and valid Like the comment, if its not helpful or invalid Dislike

To have Greptile Re-Review the changes, mention greptileai.

Greptile Summary

This PR surfaces per-registry fetch errors (previously silently swallowed) through a new in-memory registryErrors map on TemplateService, exposed via GetRegistryFetchErrors() and overlaid onto the GET /templates/registries response as lastFetchError. The approach is lightweight, correctly protected by the existing registryMu RWMutex, and non-breaking. Two minor housekeeping gaps: (1) the success path holds the local mu while acquiring registryMu, unnecessarily coupling two independent critical sections; (2) DeleteRegistry does not remove the registry's entry from registryErrors, leaving a small orphaned entry in the map.

Confidence Score: 5/5

Safe to merge; all findings are minor style/cleanup items that don't affect correctness.

The core logic is correct — the mutex usage is sound (no deadlock risk), the snapshot copy in GetRegistryFetchErrors is proper, and the DTO change is backward-compatible. Both flagged issues are P2 quality improvements rather than correctness or reliability bugs.

backend/internal/services/template_service.go — lock nesting and missing DeleteRegistry cleanup

Comments Outside Diff (1)

  1. backend/internal/services/template_service.go, line 509-526 (link)

    P2 registryErrors entry not removed on registry deletion

    When a registry is deleted, its entry in s.registryErrors is never cleaned up. With UUID IDs the practical impact is negligible, but it is a minor memory leak and would surface stale error state if the ID were ever reused. Clearing the entry here alongside invalidateRemoteCache() keeps the map consistent with the database.

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: backend/internal/services/template_service.go
    Line: 509-526
    
    Comment:
    **`registryErrors` entry not removed on registry deletion**
    
    When a registry is deleted, its entry in `s.registryErrors` is never cleaned up. With UUID IDs the practical impact is negligible, but it is a minor memory leak and would surface stale error state if the ID were ever reused. Clearing the entry here alongside `invalidateRemoteCache()` keeps the map consistent with the database.
    
    
    
    How can I resolve this? If you propose a fix, please make it concise.

    Fix in Codex

Fix All in Codex

Prompt To Fix All With AI
This is a comment left during a code review.
Path: backend/internal/services/template_service.go
Line: 557-561

Comment:
**Unnecessary lock nesting in success path**

`s.registryMu` is acquired while `mu` is already held. The registry-error cleanup and the templates-slice append are independent operations; holding `mu` across the `registryMu` lock/unlock needlessly extends the window during which `mu` is held and couples two unrelated critical sections.

```suggestion
			s.registryMu.Lock()
			delete(s.registryErrors, reg.ID)
			s.registryMu.Unlock()

			mu.Lock()
			defer mu.Unlock()
```

**Rule Used:** # Go Development Patterns

**What:** Code should p... ([source](https://app.greptile.com/review/custom-context?memory=c1082b6a-5fdc-4db8-8419-8a71ccd57636))

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: backend/internal/services/template_service.go
Line: 509-526

Comment:
**`registryErrors` entry not removed on registry deletion**

When a registry is deleted, its entry in `s.registryErrors` is never cleaned up. With UUID IDs the practical impact is negligible, but it is a minor memory leak and would surface stale error state if the ID were ever reused. Clearing the entry here alongside `invalidateRemoteCache()` keeps the map consistent with the database.

```suggestion
	s.invalidateRemoteCache()

	s.registryMu.Lock()
	delete(s.registryErrors, id)
	s.registryMu.Unlock()

	return nil
```

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "fix: surface registry fetch errors in GE..." | Re-trigger Greptile

Greptile also left 1 inline comment on this PR.

Context used:

  • Rule used - # Go Development Patterns

What: Code should p... (source)

…rcaneapp#2342)

When a template registry fails to load (e.g. due to SSRF protection
introduced in v1.17.3 blocking the registry URL, or a transient
network error), the error was silently swallowed and users had no
way to diagnose why their templates stopped appearing after upgrading.

Track the last fetch error per registry ID in a lightweight in-memory
map on TemplateService. On success the entry is cleared; on failure
the error string is stored. The existing GET /templates/registries
handler now overlays this info as lastFetchError on each registry
response object, making the root cause visible in the UI/API without
touching the database schema.
@GiulioSavini GiulioSavini requested a review from a team April 12, 2026 16:11
@kmendell
Copy link
Copy Markdown
Member

kmendell commented Apr 12, 2026

Snyk checks have passed. No issues have been found so far.

Status Scan Engine Critical High Medium Low Total (0)
Open Source Security 0 0 0 0 0 issues

💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse.

Comment on lines +557 to +561
mu.Lock()
defer mu.Unlock()
s.registryMu.Lock()
delete(s.registryErrors, reg.ID)
s.registryMu.Unlock()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Unnecessary lock nesting in success path

s.registryMu is acquired while mu is already held. The registry-error cleanup and the templates-slice append are independent operations; holding mu across the registryMu lock/unlock needlessly extends the window during which mu is held and couples two unrelated critical sections.

Suggested change
mu.Lock()
defer mu.Unlock()
s.registryMu.Lock()
delete(s.registryErrors, reg.ID)
s.registryMu.Unlock()
s.registryMu.Lock()
delete(s.registryErrors, reg.ID)
s.registryMu.Unlock()
mu.Lock()
defer mu.Unlock()

Rule Used: # Go Development Patterns

What: Code should p... (source)

Prompt To Fix With AI
This is a comment left during a code review.
Path: backend/internal/services/template_service.go
Line: 557-561

Comment:
**Unnecessary lock nesting in success path**

`s.registryMu` is acquired while `mu` is already held. The registry-error cleanup and the templates-slice append are independent operations; holding `mu` across the `registryMu` lock/unlock needlessly extends the window during which `mu` is held and couples two unrelated critical sections.

```suggestion
			s.registryMu.Lock()
			delete(s.registryErrors, reg.ID)
			s.registryMu.Unlock()

			mu.Lock()
			defer mu.Unlock()
```

**Rule Used:** # Go Development Patterns

**What:** Code should p... ([source](https://app.greptile.com/review/custom-context?memory=c1082b6a-5fdc-4db8-8419-8a71ccd57636))

How can I resolve this? If you propose a fix, please make it concise.

Fix in Codex

@kmendell kmendell merged commit 1bab88b into getarcaneapp:main Apr 12, 2026
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

🐞 Bug: Templates cannot be loaded / re-added

2 participants